Abstract

We show that there is significant benefit to using a
reconfigurable computer to enumerate bent Boolean functions
for cryptographic applications. Bent functions are
rare, and the only known way to generate all bent functions
is by a sieve technique in which many prospective functions
are tested. The speed-up achieved depends on the number
of variables n; for n = 8, we show that the reconfigurable
computer achieves better than a 60,000× speed-up over a
conventional computer. Further, we introduce the transeunt
triangle as a means to reduce the number of functions that
must be considered. For n = 6, this reduction is better
than 500,000,000 to 1.

Previously,  the  transeunt  triangle  had  been  used  only 
in  the  design  of  exclusive  OR  logic  circuits;  it  converts  a 
truth  table  to  the  algebraic  normal  form.  However,  this  fact 
has  never  been  proven  rigorously,  and  that  shortcoming 
is  removed  in  this  paper.  Our  proof  provides  a  practical 
benefit;  it  yields  a  new  realization  of  the  transeunt  triangle 
that  has  less  complexity  and  delay.  Finally,  we  show 
computational  results  from  a  reconfigurable  computer. 


1.  Introduction 

Shannon  [18]  introduced  the  concepts  of  confusion  and 
diffusion  as  a  fundamental  technique  to  achieve  security  in 
cryptographic  systems.  The  confusion  principle  is  reflected 
in  the  nonlinearity  of  Boolean  functions,  since  most  linear 
systems  are  easily  breakable.  There  are  various  criteria 
that  imply  nonlinearity,  one  of  them  being  bentness.  Bent 
functions  were  first  introduced  by  Rothaus  in  1976  [15], 
as  functions  having  maximum  distance  away  from  the  set 
of  affine  functions. 

Bent  Boolean  functions  have  the  highest  nonlinearity 
possible,  which  makes  them  useful  in  the  design  of  block 
and  stream  ciphers.  Maximum  length  sequences  based  on 


bent  functions  have  cross-correlation  and  autocorrelation 
properties  that  are  close  to  the  ones  of  Gold  and  Kasami 
codes  [12],  which  have  applications  in  spread  spectrum 
communication [6].

While bent functions can be defined precisely, generating
them is a different matter. One needs sophisticated
mathematical tools (such as invariant theory) and
computational tools to list all n-variable bent functions
(this has been achieved for n < 8). Some of these methods
cannot be easily parallelized and do not offer a significant
improvement in a reconfigurable environment.

Using  the  SRC-6  reconfigurable  computer,  we  have 
tested  millions  of  Boolean  functions.  Specific  sets  of 
Boolean  functions  were  chosen  based  on  their  specific 
properties,  including  degree,  homogeneity,  and  symmetry. 
These  groups  were  evaluated  for  relationships  between 
nonlinearity  and  specific  properties.  The  objective  is  to 
find  groups  of  Boolean  functions  that  are  rich  in  bent 
functions  [1].  These  groups,  if  small  enough,  can  be  tested 
exhaustively.  Testing  across  the  entire  set  of  functions, 
even  for  small  numbers  of  variables,  e.g.,  n  =  6  or  more, 
is  infeasible  because  of  the  large  number  of  functions. 
The  use  of  the  transeunt  triangle  enables  functions  to  be 
generated  easily  in  one  form,  converted  to  another  form  and 
then  tested  for  certain  characteristics.  Without  the  transeunt 
triangle  [2],  [4],  important  groups  of  functions  could  not 
be  tested  efficiently. 

2.  Background  and  Definitions 

Definition 2.1. A Boolean function f in n variables is
a map from the n-dimensional vector space $\mathbb{F}_2^n$
to $\mathbb{F}_2$, the two-element field. For a function f, let
$f_0 = f(0, 0, \ldots, 0)$, $f_1 = f(0, 0, \ldots, 1)$, ..., and
$f_{2^n-1} = f(1, 1, \ldots, 1)$. $TT = (f_0 f_1 \ldots f_{2^n-1})$
is the truth table representation of f.

Example 2.1. $f = x_1x_2x_3x_4$ has the truth table
representation TT = (0000000000000001).
$g = x_1x_2 \oplus x_3x_4$ has the truth table representation
TT = (0001000100011110). (End of Example)

Definition  2.2.  A  linear  function  is  the  constant  0  function 
or  the  exclusive-OR  of  one  or  more  variables.  An  affine 
function  is  a  linear  function  or  the  complement  of  a  linear 
function. 

Example 2.2. There are 16 linear functions on 4 variables:
$0$, $x_1$, $x_2$, $x_3$, $x_4$, $x_1 \oplus x_2$, $x_1 \oplus x_3$,
$x_1 \oplus x_4$, $x_2 \oplus x_3$, $x_2 \oplus x_4$, $x_3 \oplus x_4$,
$x_1 \oplus x_2 \oplus x_3$, $x_1 \oplus x_2 \oplus x_4$,
$x_1 \oplus x_3 \oplus x_4$, $x_2 \oplus x_3 \oplus x_4$, and
$x_1 \oplus x_2 \oplus x_3 \oplus x_4$. These functions
and their complements comprise the 32 4-variable affine
functions. (End of Example)

Affine  functions,  when  used  in  encrypting  a  plaintext 
message,  are  susceptible  to  a  linear  attack.  We  seek 
functions  that  are  as  “far”  away  as  possible  from  affine 
functions. 

Definition  2.3.  The  Hamming  distance  d(f,  g)  between 
two  functions  f  and  g  is  the  number  of  places  where  their 
truth  table  representations  differ. 

Definition  2.4.  The  nonlinearity  NLf  of  a  function  f  is 
the  minimum  Hamming  distance  between  f  and  an  affine 
function. 

Example 2.3. $f = x_1x_2x_3x_4$ has nonlinearity 1, since
converting the single 1 to a 0 in its truth table representation
creates the truth table representation of the constant 0
function, which is affine. $g = x_1x_2 \oplus x_3x_4$ has a distance
of 6 or 10 from any affine function. Thus, its nonlinearity is
6. (End of Example)

Definition 2.5. Let f be a Boolean function on n variables,
where n is even. f is a bent function if its nonlinearity is
maximum among n-variable functions.

Example 2.4. Rothaus [15] showed that bent functions
have nonlinearity $2^{n-1} - 2^{n/2-1}$. Thus, $f = x_1x_2x_3x_4$ is not
bent ($NL_f = 1$), and $g = x_1x_2 \oplus x_3x_4$ is bent ($NL_g = 6$).

(End  of  Example) 
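The definitions of Hamming distance, nonlinearity, and bentness can be checked by brute force for small n. The following Python sketch is only an illustration and is not the authors' implementation (their circuit is described in Verilog); the helper names `truth_table`, `affine_truth_tables`, and `nonlinearity` are assumptions introduced here. The truth table is indexed with $x_1$ as the most significant bit, as in Example 2.1.

```python
from itertools import product

def truth_table(func, n):
    """Truth table as a list of bits; x1 is the most significant index bit."""
    return [func(*bits) for bits in product((0, 1), repeat=n)]

def affine_truth_tables(n):
    """Yield the truth tables of all 2^(n+1) affine functions on n variables."""
    for coeffs in product((0, 1), repeat=n + 1):
        c0, rest = coeffs[0], coeffs[1:]
        yield [c0 ^ (sum(c & x for c, x in zip(rest, bits)) & 1)
               for bits in product((0, 1), repeat=n)]

def nonlinearity(tt, n):
    """Minimum Hamming distance from tt to any affine function (Definition 2.4)."""
    return min(sum(a != b for a, b in zip(tt, aff))
               for aff in affine_truth_tables(n))

n = 4
f = truth_table(lambda x1, x2, x3, x4: x1 & x2 & x3 & x4, n)
g = truth_table(lambda x1, x2, x3, x4: (x1 & x2) ^ (x3 & x4), n)
bent_bound = 2 ** (n - 1) - 2 ** (n // 2 - 1)  # Rothaus bound, n even

print(nonlinearity(f, n), nonlinearity(g, n), bent_bound)
```

This exhaustive comparison against all $2^{n+1}$ affine functions mirrors Definition 2.4 directly; it is practical only for small n.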

The  property  of  “bentness”  depends  on  the  function’s 
truth  table  representation.  However,  another  representation 
provides  alternative  insight  into  bentness  and  allows  a 
reduction  in  the  number  of  functions  that  must  be  searched 
during  bent  function  discovery. 

Definition 2.6. The algebraic normal form (ANF) of a
function f is $f = \bigoplus_{a \in \mathbb{F}_2^n} c_a x_1^{a_1} x_2^{a_2} \cdots x_n^{a_n}$,
where $\bigoplus$ is the exclusive OR sum, $a = (a_1, a_2, \ldots, a_n)$,
$c_a, a_i \in \mathbb{F}_2$, $x_i^0 = 1$, and $x_i^1 = x_i$.
$ANF = (c_0 c_1 \ldots c_{2^n-1})$ is the ANF representation of f.

Example 2.5. $f = x_1x_2x_3x_4$ has the ANF
$f = x_1x_2x_3x_4$ and the ANF representation
ANF = (0000000000000001). $g = x_1x_2 \oplus x_3x_4$ has the
ANF $g = x_1x_2 \oplus x_3x_4$ and the ANF representation
ANF = (0001000000001000). (End of Example)

Definition 2.7. The degree of a product term is the
number of variables in that term. The degree of a function
f is the maximum of the degrees among the product terms
in the ANF of f.

Example 2.6. $f = x_1x_2x_3x_4$ has degree 4 and
$g = x_1x_2 \oplus x_3x_4$ has degree 2. (End of Example)

Definition 2.8. Functions f and h belong to the same
affine class if and only if $f = h \oplus a$, where a is an affine
function.

Example 2.7. $f = x_1x_2x_3x_4$, a non-bent function, belongs
to an affine class of 32 functions. $g = x_1x_2 \oplus x_3x_4$, a
bent function, belongs to an affine class of 32 functions.
(End of Example)

Certainly, each affine class contains the same number
of functions, namely $2^{n+1}$. Also, all functions in the same
affine class as a non-affine function f have the same
degree.

Definition  2.9.  A  function  f  is  homogeneous  of  degree  d 
if  and  only  if  all  terms  in  the  ANF  of  f  have  degree  d. 

Example 2.8. $f = x_1x_2x_3x_4$ is homogeneous of degree
4, and $g = x_1x_2 \oplus x_3x_4$ is homogeneous of degree 2.
(End of Example)
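Degree and homogeneity can be read directly off the ANF representation, because the index of each coefficient, written in binary, marks which variables appear in the corresponding product term. The sketch below is illustrative only; the function names `degree` and `is_homogeneous` are assumptions introduced here.

```python
def term_degrees(anf):
    """Degrees of the product terms present in an ANF representation.

    anf[i] is the coefficient of the term whose variables are given by the
    1 bits of i, so the term's degree is the number of 1 bits in i.
    """
    return [bin(i).count("1") for i, c in enumerate(anf) if c]

def degree(anf):
    """Degree of the function (Definition 2.7); 0 for the constant functions."""
    return max(term_degrees(anf), default=0)

def is_homogeneous(anf):
    """True if all terms in the ANF have the same degree (Definition 2.9)."""
    return len(set(term_degrees(anf))) == 1

f_anf = [int(b) for b in "0000000000000001"]  # x1 x2 x3 x4
g_anf = [int(b) for b in "0001000000001000"]  # x1 x2 XOR x3 x4

print(degree(f_anf), is_homogeneous(f_anf))
print(degree(g_anf), is_homogeneous(g_anf))
```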

Xia,  Seberry,  Pieprzyk,  and  Charnes  [20]  considered 
homogeneity  in  the  context  of  bent  functions  and  showed 
the  next  result. 

Theorem 2.1. When n > 6, no n-variable homogeneous
bent function has degree $n/2$.

Because of the function $x_1x_2 \oplus x_3x_4$ (g in Example 2.8),
which is bent and homogeneous of degree 2, Theorem 2.1 does not
hold for n = 4. Qu, Seberry, and Pieprzyk [14] found 30
homogeneous 6-variable bent functions of degree 3, and so
Theorem 2.1 does not hold for n = 6. Therefore, from [14],
[20], for n > 6, degree-$n/2$ n-variable bent functions exist,
but none are homogeneous. More recently, Meng et al. [11]
showed (purely combinatorially) that, for any nonnegative
integer k, there exists a positive integer N such that, for
n > N, there do not exist 2n-variable homogeneous bent
functions having degree n - k or more, where N is the least
integer satisfying
$2^{N-1} > \binom{N+1}{0} + \binom{N+1}{1} + \cdots + \binom{N+1}{k+1}$.

3.  Architecture  of  Bent  Function  Enumerator 

A  reconfigurable  computer  allows  one  to  adapt  the 
architecture  to  the  problem.  Fig.  1  shows  the  architecture 
to  enumerate  bent  functions  based  on  the  ANF  of  the 
tested  functions.  This  and  other  variations  yield  the  data 
we  present  later.  In  all  cases,  a  counter  was  used  to 


enumerate  prospective  functions.  This  is  shown  on  the  left. 
This  is  applied  to  a  block  labeled  Transeunt  Triangle.  In 
this  case,  the  counter  enumerates  ANFs;  each  bit  of  the 
counter  determines  the  presence  or  absence  of  a  term  in  the 
ANF.  The  transeunt  triangle  produces  the  corresponding 
truth  table.  This  is  then  applied  to  a  block  that  computes 
the  function’s  nonlinearity,  NL.  If  NL  is  maximum,  the 
function  is  bent,  and  it  is  stored. 


Fig.  1.  Bent  function  enumeration  circuit 


In the SRC-6 reconfigurable computer, this circuit is
implemented on a Xilinx Virtex2 Pro FPGA. It is pipelined
and runs at 100 MHz. Specifically, one function is tested
every clock cycle. We used this to enumerate all 6-variable
bent functions [16]. If we had to enumerate all
$2^{2^6} = 2^{64} \approx 1.85 \times 10^{19}$ 6-variable functions,
this would take 5,849 years. However, Rothaus [15] showed that
no bent function has degree greater than $n/2$. By eliminating
functions with degree greater than $n/2 = 3$, it is only
necessary to enumerate
$2^{\binom{6}{3}+\binom{6}{2}+\binom{6}{1}+\binom{6}{0}} = 2^{42}$
functions. A function in an affine class is bent if and only
if all functions in the same affine class are bent. As a
result, the number of bent functions is found by multiplying
the number of affine classes containing bent functions by the
number of functions in each class,
$2^{\binom{6}{1}+\binom{6}{0}} = 2^7 = 128$.
The number of affine classes with degree 3 or less is
$2^{\binom{6}{3}+\binom{6}{2}} = 2^{35}$. At one class (function)
per 100 MHz clock period, this enumeration takes only 5.7
minutes plus 0.5 minutes for data transfer, for a total of
6.2 minutes. That is, by enumerating only the affine classes
corresponding to functions of degree 3 or less, we achieve a
reduction of $2^{64}/2^{35} = 2^{29} = 536,870,912$. However,
this requires that we quickly convert between the ANF of a
function and its truth table. For this, we need the Transeunt
Triangle of Fig. 1, which we discuss in the next section.
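The counts and run times quoted above follow from simple arithmetic. The short Python sketch below reproduces them as a check; the variable names are introduced here for illustration, and a 365-day year is assumed when converting seconds to years.

```python
from math import comb

n = 6
clock_hz = 100e6  # one function tested per clock cycle at 100 MHz

all_functions = 2 ** (2 ** n)                                     # 2^64
low_degree = 2 ** sum(comb(n, d) for d in range(n // 2 + 1))      # 2^42
class_size = 2 ** (comb(n, 1) + comb(n, 0))                       # 2^7 = 128
affine_classes = low_degree // class_size                         # 2^35

years_all = all_functions / clock_hz / (365 * 24 * 3600)
minutes_classes = affine_classes / clock_hz / 60
reduction = all_functions // affine_classes                       # 2^29

print(f"{years_all:.0f} years, {minutes_classes:.1f} minutes, reduction {reduction}")
```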

Fig. 2 shows the circuit that realizes the Nonlinearity
block of Fig. 1. The truth table representation of the
function f is applied on the left to $2^{n+1}$ sets of exclusive
OR gates to compute $2^{n+1}$ distance vectors. The number
of 1's in these vectors is the distance from f to each
affine function. Each Ones_Count circuit produces a binary
number that is the distance between f and one affine
function. Then, a Minimum circuit computes the overall
minimum distance. This is the nonlinearity NL.


Both the Ones_Count and the Minimum circuit in
Fig. 2 are trees. Fig. 3 shows that, in the case of the
Ones_Count circuit, adders of various sizes form the
circuit. Fig. 4 shows that, in the case of the Minimum
circuit, two-input one-output minimum circuits are used.


The  part  of  the  circuit  in  Fig.  1  that  has  the  ANF  as  input 
and the "store" signal as output is combinational. However,
its  delay  is  larger  than  the  SRC-6’s  100  MHz  clock  period, 
and  so,  it  is  pipelined.  For  n  =  6,  the  pipeline  stages  are 
shown  in  Table  1. 


TABLE 1. Function of each pipeline stage

| Stage | Circuit | Description |
|---|---|---|
| 1 | Transeunt Triangle (Fig. 1) | |
| 2 | EXOR gates (Fig. 2) | |
| 3 | Ones_Count (Figs. 2 & 3) | 16 bits => 4 partial sums |
| 4 | Ones_Count (Figs. 2 & 3) | 4 partial sums => 1 sum out |
| 5 | Minimum (Figs. 2 & 4) | 128 words in => 32 words out |
| 6 | Minimum (Figs. 2 & 4) | 32 words in => 8 words out |
| 7 | Minimum (Figs. 2 & 4) | 8 words in => 2 words out |
| 8 | Minimum (Figs. 2 & 4) | 2 words in => 1 word out |

Fig.  4.  Minimum  circuit 

All  of  this  is  implemented  on  the  FPGA  and  is  described 
in  Verilog.  The  counter  that  produces  the  ANF  in  Fig.  1 
is  implemented  in  C  code  that  is  compiled  into  a  circuit 
on  the  FPGA.  This  and  overhead  circuitry  require  6  more 
pipeline  stages.  Therefore,  there  are  a  total  of  14  stages. 

As discussed earlier, there are $2^{35} \approx 3.4 \times 10^{10}$
iterations. Therefore, the 14-clock latency is minuscule in
comparison to the total computation time. Even reducing
the latency to 0 (no pipeline) would yield no perceptible
reduction in computation time. The above discussion applies
to the bent function enumeration that is described in
Section 5.2. Other enumerations, such as the distribution
of nonlinearity of 8-variable rotation symmetric Boolean
functions described in Section 5.4, correspond
to somewhat different circuits (e.g., they do not use the
transeunt triangle). However, the same conclusion holds: the
latency has an imperceptible effect on the computation time.

An examination of Figs. 1-4 reveals why a reconfigurable
computer is much more efficient than a conventional
computer in computing bent Boolean functions. The
Ones_Count circuit requires many small adders that can be
used simultaneously. An FPGA can realize these, albeit at
an increased delay, compared to a conventional computer.
A conventional computer has only a few large wordwidth
adders. The large wordwidth is not used efficiently. Similarly,
the Minimum circuit requires many comparators that
can be used simultaneously on an FPGA, but are much less
abundant on a conventional computer.

4.  The  Transeunt  Triangle 

4.1.  Definition 


Green [9] and others [2], [3], [4], [8], [19] propose the
transeunt triangle as a means to derive the ANF from the
truth table of a given function and, in so doing, produce
compact exclusive OR sum-of-products circuits. In this
paper, we show the benefit of the transeunt triangle in
a computational application. Not only can the ANF be
computed from the truth table, but the truth table can be
computed from the ANF by using the same algorithm. This
yields a significant computational advantage.

Definition 4.10. The transeunt triangle* is a set of
$2^n - 1$ rows of adjacent 2-input 1-output exclusive-OR
gates, where adjacent exclusive-OR gates connect to the
same point. The input is a set of $2^n$ binary values that
applies to the first (bottom) row of the triangle. That is, the
inputs connect to a row of $2^n - 1$ adjacent exclusive-OR
gates, whose outputs connect to a row of $2^n - 2$ adjacent
exclusive-OR gates, etc. The apex of the transeunt triangle
is a row of just one exclusive-OR gate. The output of the
transeunt triangle consists of the leftmost input bit and
the outputs of the leftmost gates in each row. The inputs
are indexed by the binary tuples 00...000, 00...001,
00...010, ..., and 11...111 from left to right. Similarly,
the outputs are indexed from the lower left corner to the
apex by the binary tuples 00...000, 00...001, 00...010,
..., and 11...111.

*. Green [9] and others [4] define the transeunt triangle to be the logic
values at the inputs and outputs of the 2-input 1-output exclusive-OR
gates. We define it to be a circuit of exclusive-OR gates.

Example 4.9. Fig. 5a shows the transeunt triangle for
n = 3. In this case, there are eight inputs and eight outputs.

TABLE 2. Example transeunt triangle in/out

| $x_1x_2x_3$ | 000 | 001 | 010 | 011 | 100 | 101 | 110 | 111 | Expression |
|---|---|---|---|---|---|---|---|---|---|
| Output ANF Repr. | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | $x_3 \oplus x_2 \oplus x_1$ |
| Input TT Repr. | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | $\bar{x}_1\bar{x}_2x_3 \vee \bar{x}_1x_2\bar{x}_3 \vee x_1\bar{x}_2\bar{x}_3 \vee x_1x_2x_3$ |
| Output TT Repr. | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | $\bar{x}_1\bar{x}_2x_3 \vee \bar{x}_1x_2\bar{x}_3 \vee x_1\bar{x}_2\bar{x}_3$ |
| Input ANF Repr. | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | $x_3 \oplus x_2 \oplus x_1 \oplus x_1x_2x_3$ |

Table 2 shows an example of the output values for
given input values. Specifically, if the input truth table
representation, TT = (01101001), which corresponds
to the minterm canonical form
$\bar{x}_1\bar{x}_2x_3 \vee \bar{x}_1x_2\bar{x}_3 \vee x_1\bar{x}_2\bar{x}_3 \vee x_1x_2x_3$,
is applied to the bottom, then the left side of
the transeunt triangle corresponds to the output ANF
representation of this function, ANF = (01101000),
which is $x_3 \oplus x_2 \oplus x_1$. Conversely, Table 2 also shows that
if the input ANF representation, ANF = (01101001),
is applied to the bottom, then the truth table representation
of that function, TT = (01101000), is produced on the
left side of the transeunt triangle. (End of Example)
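The full transeunt triangle of Definition 4.10 can be modeled in a few lines of software. The Python sketch below is an illustration written for this purpose, not the authors' Verilog circuit; the name `transeunt_triangle` is an assumption. Each new row is the exclusive OR of adjacent pairs in the row below, and the output is the left edge of the triangle, read from the bottom to the apex.

```python
def transeunt_triangle(bits):
    """Model of the full transeunt triangle.

    bits: list of 2^n input bits (bottom row, indexed 00...0 to 11...1).
    Returns the left edge of the triangle from the bottom to the apex.
    """
    row = list(bits)
    left_edge = [row[0]]              # leftmost input bit
    while len(row) > 1:
        # one row of 2-input exclusive-OR gates over adjacent pairs
        row = [a ^ b for a, b in zip(row, row[1:])]
        left_edge.append(row[0])      # output of the leftmost gate in this row
    return left_edge

tt = [0, 1, 1, 0, 1, 0, 0, 1]         # input TT row of Table 2
anf = transeunt_triangle(tt)
print(anf)                            # ANF representation
print(transeunt_triangle(anf))        # applying the triangle again recovers tt
```

Applying the same function twice returns the original input, which is the software counterpart of using one circuit for both conversion directions.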


4.2. The Transeunt Triangle Proof

a) Full Transeunt Triangle b) Reduced Transeunt Triangle

Fig. 5. Comparing the full and reduced transeunt
triangle for n = 3.

Green [9] did not prove that the transeunt triangle converts
a truth table representation to an ANF representation.
We do so now. The following result from [10, p. 68] will
be used in our proof.

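Theorem 4.2 (Lucas). Let p be a prime number, and let n and r be
two integers represented in base p, namely
$n = n_s p^s + \cdots + n_1 p + n_0$ and
$r = r_s p^s + \cdots + r_1 p + r_0$, with $0 \le n_i, r_i < p$. Then
$\binom{n}{r} \equiv \prod_{i=0}^{s} \binom{n_i}{r_i} \pmod{p}$.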
The  main  result  of  this  section  is  as  follows. 

Theorem  4.3.  If  the  input  to  the  transeunt  triangle  is  the 
truth  table  representation  of  an  n-variable  function  f,  then 
the  output  is  the  ANF  representation  of  f.  Conversely,  if  the 
input  to  the  transeunt  triangle  is  the  ANF  representation 
of  an  n-variable  function  f ,  then  the  output  is  the  truth 
table  representation  of  f. 

Proof:  The  second  statement  follows  from  the  first  because 
the  logic  values  in  the  transeunt  triangle  are  unchanged  if 
all  exclusive-OR  gates  are  rotated  120  degrees  clockwise 
(thus  exchanging  the  input  with  the  output).  We  prove  the 
first  statement  by  induction. 

Fig.  6a  shows  that  the  first  statement  is  true  for  all 
functions  on  n  =  1  variable. 

Assume the first statement is true for n, and consider
an (n+1)-variable transeunt triangle. Fig. 6b shows that
there are two n-variable transeunt triangles embedded in
this transeunt triangle. Applied as an input to the lower
one is $f_{0 \to x_1}$, shown as $f_0$ in Fig. 6b. By the inductive
assumption, the output of this transeunt triangle is the ANF
representation of $f_{0 \to x_1}$.

We now show that $f_{0 \to x_1} \oplus f_{1 \to x_1}$ is applied as an input
to the upper transeunt triangle. Let $\alpha$ be an assignment of
values to $x_2, x_3, \ldots,$ and $x_n$. Then, each input to the upper
triangle is driven by a
$(2^{n-1}+1) \times (2^{n-1}+1) \times (2^{n-1}+1)$
transeunt triangle whose input assignments range from $0\alpha$
through $1\alpha$. This is shown in Fig. 6b as a dotted-line triangle.
For example, the left input is driven by a transeunt triangle
whose $2^{n-1}+1$ inputs are 00...000, 00...001, ...,
01...111, and 10...000, where $\alpha$ = 0...000. Consider
one triangle, and index its inputs by i, for $0 \le i \le 2^{n-1}$.
The output of this triangle is the exclusive-OR of some
number of its inputs. The number of times an assignment
appears in the exclusive-OR expression of the inputs is
the number of paths from that input to the output. This
is just $\binom{2^{n-1}}{i}$. For i = 0 and $i = 2^{n-1}$,
$\binom{2^{n-1}}{i} = 1$; i.e.,
there is exactly one path to the triangle's output, and these
two inputs appear once in the exclusive-OR expression.
Consider i such that $0 < i < 2^{n-1}$. We use Theorem 4.2 with p = 2.
Since $n = 2^{k-1}$, $n_i = 0$ for all i, except that $n_{k-1} = 1$.
For $0 < r < n = 2^{k-1}$, there is at least one j such
that $\binom{n_j}{r_j} = \binom{0}{1} = 0$. Thus, for $0 < r < n = 2^{k-1}$,
$\binom{n}{r} \equiv 0 \pmod{2}$. Thus, the number of paths from any
such assignment of values in the truth table input to the root is
even. It follows that the only terms that occur are $0\alpha$ and
$1\alpha$. We can conclude, therefore, that the input to the upper
transeunt triangle in Fig. 6b is the truth table representation
of $f_{0 \to x_1} \oplus f_{1 \to x_1}$, shown in this figure as $f_0 \oplus f_1$.

a) 1-Variable Functions b) Triangle Decomposition

Fig. 6. Transeunt triangle composition

By the inductive hypothesis, the output of the upper
transeunt triangle is the ANF representation of
$f_{0 \to x_1} \oplus f_{1 \to x_1}$. The input to the (n+1)-variable
transeunt triangle in Fig. 6b is the truth table representation of
$f_{0 \to x_1}\bar{x}_1 \vee f_{1 \to x_1}x_1$. The output is the ANF
representation of
$f_{0 \to x_1} \oplus (f_{0 \to x_1} \oplus f_{1 \to x_1})x_1$,
which represents the same function. ■
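The parity argument used in the proof can be checked numerically. The sketch below compares the binomial coefficients modulo 2 against the bitwise form of Lucas' theorem for p = 2 (a binomial coefficient is odd exactly when every 1 bit of r is also a 1 bit of n), and confirms that $\binom{2^k}{r}$ is even for $0 < r < 2^k$. The helper name `lucas_mod2` is an assumption introduced here.

```python
from math import comb

def lucas_mod2(n, r):
    """binomial(n, r) mod 2 via Lucas' theorem with p = 2.

    Each base-2 digit pair contributes binomial(n_i, r_i), which is 0 only
    when n_i = 0 and r_i = 1, so the product is 1 iff r has no 1 bit
    outside the 1 bits of n.
    """
    return 1 if (r & ~n) == 0 else 0

# Lucas' theorem agrees with direct computation for small arguments.
lucas_ok = all(comb(n, r) % 2 == lucas_mod2(n, r)
               for n in range(128) for r in range(n + 1))

# Interior binomial coefficients of a power of two are all even, so only
# the two end inputs of each dotted-line triangle survive the XOR.
interior_even = all(comb(2 ** k, r) % 2 == 0
                    for k in range(1, 9) for r in range(1, 2 ** k))

print(lucas_ok, interior_even)
```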

4.3.  Reduced  Transeunt  Triangle 

We note that, in Fig. 6b, only one of the dotted-line
triangles embeds a transeunt triangle (the left dotted-line
triangle). That is, all but one of these triangles can be
replaced by a single 2-input 1-output exclusive-OR gate.
Doing this yields the reduced transeunt triangle. Fig. 5b
shows the reduced transeunt triangle for n = 3. In this case,
only 12 2-input 1-output exclusive-OR gates are needed,
compared to 28 gates for the full transeunt triangle.

Definition 4.11. A transeunt triangle is balanced if, for
every output f, the path length to all inputs on which f
depends is the same.

Example 4.10. Both the full and reduced transeunt triangles
are balanced. A transeunt triangle in which all outputs
are driven by a cascade of 2-input 1-output exclusive-OR
gates is not a balanced transeunt triangle.

Lemma 4.1. The full transeunt triangle for n-variable
functions requires $(2^n - 1)2^{n-1}$ 2-input 1-output exclusive-OR
gates, while the reduced transeunt triangle requires
$n2^{n-1}$, which is the smallest possible among all balanced
transeunt triangles using only 2-input 1-output exclusive-OR
gates.

Proof: The number of gates in the full transeunt triangle
is $f_n = 1 + 2 + 3 + \cdots + (2^n - 1) = \frac{2^n(2^n - 1)}{2} = (2^n - 1)2^{n-1}$.
The number of gates $r_n$ in the reduced transeunt triangle is
given by the recurrence relation $r_n = 2r_{n-1} + 2^{n-1}$, with
initial condition $r_1 = 1$. Solving yields $r_n = n2^{n-1}$. The fact
that this is the smallest possible can be seen as follows.

Order the inputs so that they are in lexicographical
order, 00...00, 00...01, ..., and 11...11, and construct
a minimal balanced transeunt triangle so that the outputs
are in lexicographical order. Each output bit indexed by
$o_1 o_2 \ldots o_n$ is the exclusive OR of all input bits indexed
by $i_1 i_2 \ldots i_n$ such that $i_j \le o_j$ for all j. For example,
output bit 00...00 is driven by input bit 00...00, and no gate
is needed. Output bit 00...01 is driven by a gate with
input bits 00...00 and 00...01, and one gate is needed.
Specifically, each output bit is the root of a full binary
tree, where the leaves are driven by input bits whose
index is the same as the output node's index where some
1's may be changed to 0's. Input bit 00...00 is in the
binary tree of every output. Let $wt(o_1 o_2 \ldots o_n)$ be the
Hamming weight of $o_1 o_2 \ldots o_n$ (the number of $o_j$'s that
are 1). Consider the number of nodes added to the transeunt
triangle constructed so far by output bit $o_1 o_2 \ldots o_n$. The
fewest nodes added are those in the binary tree associated
with output bit $o_1 o_2 \ldots o_n$ that are along a path from the
output bit $o_1 o_2 \ldots o_n$ to the input bit $i_1 i_2 \ldots i_n$ such that
$i_j = o_j$ for all $1 \le j \le n$. None of these nodes are
part of the transeunt triangle constructed so far. Because
the transeunt triangle is balanced, there are $wt(o_1 o_2 \ldots o_n)$
added nodes. Because of the lexicographical order of the
inputs, all arcs from these nodes must go toward that part
of the transeunt triangle constructed so far.

It follows that the number of gates in a balanced
transeunt triangle is bounded below by the total number
of 1's among all binary n-tuples, which is $n2^{n-1}$. ■

In addition, the reduced transeunt triangle yields smaller
delay than the full transeunt triangle. It is straightforward
to show the following.

Lemma 4.2. The full transeunt triangle for n-variable
functions requires $2^n - 1$ gate delays, while the reduced
transeunt triangle requires n gate delays, where one gate
delay is the delay associated with a 2-input 1-output
exclusive-OR gate.

Since  the  full  and  reduced  transeunt  triangles  are  balanced, 
the  delay  to  an  output  from  any  of  the  inputs  is  identical. 
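The gate counts and delays in Lemmas 4.1 and 4.2 can be illustrated in software. The sketch below assumes that the reduced transeunt triangle corresponds to the usual n-stage butterfly network, which is what the recurrence $r_n = 2r_{n-1} + 2^{n-1}$ suggests; this correspondence and the function names are assumptions made here, not a description of the authors' circuit. Each pass of the outer loop models one gate delay, and each XOR performed models one 2-input gate.

```python
def full_gate_count(n):
    """Gates in the full transeunt triangle: 1 + 2 + ... + (2^n - 1)."""
    return (2 ** n - 1) * 2 ** (n - 1)

def reduced_transeunt_triangle(bits, n):
    """Butterfly model of the reduced triangle (assumed structure).

    Returns (output bits, number of XOR gates used). There are n stages,
    so the modeled delay is n gate delays.
    """
    out = list(bits)
    gates = 0
    for stage in range(n):
        step = 1 << stage
        for i in range(len(out)):
            if i & step:
                # out[i ^ step] has this index bit clear, so it is not
                # modified during the current stage.
                out[i] ^= out[i ^ step]
                gates += 1
    return out, gates

tt = [0, 1, 1, 0, 1, 0, 0, 1]          # Table 2 input truth table
anf, gates = reduced_transeunt_triangle(tt, 3)
print(anf, gates, full_gate_count(3))
```

For n = 3, the model uses $n2^{n-1} = 12$ gates, against 28 for the full triangle, and produces the same ANF representation as Table 2.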

5.  Experimental  Results 

5.1. Speed-up Achievable by the Reconfigurable
Computer

We compare the computation time required by an SRC-6
reconfigurable computer with the time required by a
conventional computer. In our case, this is an Intel Xeon
processor running at 2.8 GHz, which is one of two
conventional microprocessors associated with the SRC-6.
The program, written in C, computes the nonlinearity of
n-variable functions, forming the distribution of functions to
nonlinearity. Similarly, the time it takes to do the same
calculation on the SRC-6 can be calculated since the
throughput is one function per clock period. The results
are shown in Table 3.


TABLE 3. Speed-up obtained by the SRC-6
reconfigurable computer

| n | PC Compute Time (@2.8 GHz) | SRC-6 Compute Time (@100 MHz) | Speed-up Factor |
|---|---|---|---|
| 2 | 6.38 μsec | 0.16 μsec | 39.9× |
| 3 | 457.0 μsec | 2.56 μsec | 178.5× |
| 4 | 0.388 sec | 655.4 μsec | 592.0× |
| 5 | 25.338 hours | 42.9 sec | 2,126.3× |
| 6 | 39,807,788 years | 5,849 years | 6,805.9× |
| 7 | $2.05 \times 10^{27}$ years | $1.08 \times 10^{23}$ years | 19,005× |
| 8 | $2.28 \times 10^{66}$ years | $3.67 \times 10^{61}$ years | 62,111× |

Speed-up factors range from 39.9× for n = 2 to
62,111× for n = 8. Note that the speed-up factor should
nearly quadruple for each increase in n by 1. On the PC, the
computation time per function doubles for each increase in n
because the number of affine functions doubles. Similarly, the
number of Ones_Count operations also doubles. However,
on the SRC-6, the circuit size increases, but the throughput
of one function per clock cycle remains the same. The
computation times for $2 \le n \le 5$ shown in Table 3 were
achieved by programs that enumerated all $2^{2^n}$ n-variable
functions. The computation times for $6 \le n \le 8$ for the PC
were obtained by running the C program over a fraction of
the functions and then prorating to compute the time had
all functions been enumerated. Although the computation
time on the SRC-6 for these values of n is much less,
it is still excessive, and this computation could not be
done. However, the speed-up applies when we enumerate
a sufficiently small subset of all functions. For example,
we enumerated all 6-variable functions with degree 3
or less and, in so doing, enumerated all bent functions
[16] using the theorem by Rothaus [15]. As discussed in
Section 3, this computation required 6.2 minutes. Had this
computation been done on the PC, it would have taken
6,805.9 × (5.7 minutes), or about 27 days. We achieved the
62,111× speed-up associated with n = 8 in Table 3 in
computing the distribution of rotation symmetric functions,
as described in Section 5.4.
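The SRC-6 column of Table 3 and the 27-day PC estimate follow from the stated throughput of one function per 100 MHz clock period. The Python sketch below reproduces these figures as a check; the names `src6_seconds` and `pc_days` are introduced here for illustration, and the speed-up factor 6,805.9 is taken from Table 3 rather than computed.

```python
CLOCK_HZ = 100e6  # SRC-6 clock rate: one function per clock period

def src6_seconds(n):
    """Time to enumerate all 2^(2^n) n-variable functions on the SRC-6."""
    return 2 ** (2 ** n) / CLOCK_HZ

# PC time for the degree <= 3 enumeration, scaled from the 5.7-minute
# SRC-6 compute time by the n = 6 speed-up factor of Table 3.
pc_days = 6805.9 * 5.7 / (60 * 24)

for n in (2, 3, 4, 5):
    print(n, src6_seconds(n))
print(round(pc_days))
```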

5.2. Number of 6-Variable Bent Functions

The computation described in the previous section verified
Preneel's [13] result that there are 5,425,430,528 bent
functions on 6 variables. We showed further that 1,777,664
of these functions, or 0.03%, have degree 2. All of the
remaining functions have degree 3. Table 4 shows the resource
usage on the Xilinx Virtex2 Pro.

TABLE 4. Resources used to compute the
nonlinearity of 6-variable functions of degree
2 and 3

| Number of | Number/Total | Percentage |
|---|---|---|
| Slice Flip-Flops | 6,522/88,192 | 7% |
| 4-Input LUTs | 8,997/88,192 | 10% |
| Occupied Slices | 6,450/44,096 | 14% |

5.3. Nonlinearity of 6-Variable Homogeneous
Boolean Functions

In the search for trends in bent function properties,
it is useful to examine the nonlinearity distribution of
homogeneous Boolean functions. There are
$\sum_{d=0}^{6} \left(2^{\binom{6}{d}} - 1\right) = 1,114,237$
6-variable homogeneous functions. Fig.
7 shows the distribution of 6-variable homogeneous functions
to nonlinearity and degree, as computed on the FPGA.
The vertical axis shows the $\log_2$ number of functions. For
example, there are 63 homogeneous functions of nonlinearity
0 and degree 1; these are the linear functions.
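The count of homogeneous functions can be verified directly: for each degree d there are $\binom{6}{d}$ possible terms, and any nonempty subset of them gives a homogeneous function of that degree. A short Python check (variable names are illustrative) is:

```python
from math import comb

n = 6
# Nonempty subsets of the comb(n, d) degree-d terms, for each degree d.
per_degree = [2 ** comb(n, d) - 1 for d in range(n + 1)]
total = sum(per_degree)

print(per_degree)
print(total)
```

The entry for d = 1 is the 63 nonconstant linear functions mentioned above.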

The bent functions have nonlinearity 28, and Fig. 7
shows that they occur with two different degrees: 13,888 have
degree 2 and 30 have degree 3. The next largest nonlinearity is 23,
and again, functions exist only with degrees 2 and 3. For
degrees 3, 4, and 5, there is a bell-like distribution across
nonlinearity. This information, when combined with the
same data for higher n, could lead to a further reduction in
the number of test functions, making it possible to find
more bent functions without increasing computation time.

Fig. 7. Distribution of homogeneous 6-variable functions by nonlinearity and degree
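The degree-2 bent count quoted above is small enough to reproduce in software. The following is a minimal sketch, not the FPGA design: it enumerates every nonzero homogeneous degree-2 function on 6 variables, computes its nonlinearity with a fast Walsh-Hadamard transform, and tallies the results. Function names are illustrative. The degree-3 case (2^20 functions) follows the same pattern but is much slower in pure Python.

```python
from collections import Counter
from itertools import combinations

N = 6
SIZE = 1 << N  # 64 truth-table entries


def walsh_spectrum(truth_table):
    """Fast Walsh-Hadamard transform of (-1)^f(x), with f given as a 0/1 list."""
    w = [1 - 2 * bit for bit in truth_table]
    step = 1
    while step < len(w):
        for start in range(0, len(w), 2 * step):
            for i in range(start, start + step):
                a, b = w[i], w[i + step]
                w[i], w[i + step] = a + b, a - b
        step *= 2
    return w


def nonlinearity(truth_table):
    """Distance to the nearest affine function: 2^(n-1) - max|W| / 2."""
    peak = max(abs(v) for v in walsh_spectrum(truth_table))
    return len(truth_table) // 2 - peak // 2


def quadratic_homogeneous_distribution():
    """Nonlinearity histogram of all nonzero homogeneous degree-2 functions."""
    # Truth table of each monomial x_i x_j, packed into one integer
    # (bit x of the integer is the monomial's value at input x).
    monomials = []
    for i, j in combinations(range(N), 2):
        mask = 0
        for x in range(SIZE):
            if (x >> i) & 1 and (x >> j) & 1:
                mask |= 1 << x
        monomials.append(mask)

    histogram = Counter()
    for selection in range(1, 1 << len(monomials)):  # skip the zero function
        packed = 0
        for k, mask in enumerate(monomials):
            if (selection >> k) & 1:
                packed ^= mask
        table = [(packed >> x) & 1 for x in range(SIZE)]
        histogram[nonlinearity(table)] += 1
    return histogram


if __name__ == "__main__":
    for nl, count in sorted(quadratic_homogeneous_distribution().items()):
        print(f"nonlinearity {nl}: {count:,} functions")
```

The entry for nonlinearity 28 is 13,888, in agreement with the degree-2 bent count reported in the text.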


Table  5  shows  the  resource  usage  on  the  Xilinx  Virtex2 
Pro. 


TABLE 5. Resources used to compute the nonlinearity of 6-variable homogeneous functions

Number of           Number/Total      Percentage
Slice Flip-Flops    7,959/88,192      9%
4-Input LUTs        12,335/88,192     13%
Occupied Slices     8,724/44,096      19%

5.4.  Nonlinearity  of  8-Variable  Rotation 
Symmetric  Boolean  Functions 

Definition 5.12. A function f is rotation symmetric if and
only if, for any $(x_1, x_2, \ldots, x_n) \in \mathbb{F}_2^n$,

$f(x_1, x_2, x_3, \ldots, x_n) = f(x_n, x_1, x_2, \ldots, x_{n-1})$.

In a rotation symmetric function, “rotating” an assignment
of values to the variables leaves the function value unchanged.
Rotation  symmetric  functions  have  interesting  properties 
[6]  and  there  is  evidence  to  suggest  that  this  class  is  rich 
in  bent  functions.  It  is  conjectured  [7]  that  the  weight 
and  nonlinearity  of  a  third  degree  homogeneous  rotation 
symmetric  function  are  identical. 
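The definition can be made concrete with a short sketch. A rotation symmetric function must take a single value on each orbit of inputs under cyclic rotation, so counting orbits gives the size of the class. The code below is illustrative only. The names `rotate`, `rotation_orbits`, and `is_rotation_symmetric` are ours, and inputs are encoded as n-bit integers.

```python
def rotate(x, n):
    """Cyclically shift the n-bit input x by one position."""
    return ((x << 1) | (x >> (n - 1))) & ((1 << n) - 1)


def rotation_orbits(n):
    """Partition all n-bit inputs into orbits under cyclic rotation."""
    seen = set()
    orbits = []
    for x in range(1 << n):
        if x in seen:
            continue
        orbit = []
        y = x
        while y not in seen:
            seen.add(y)
            orbit.append(y)
            y = rotate(y, n)
        orbits.append(orbit)
    return orbits


def is_rotation_symmetric(truth_table, n):
    """True if f(x) == f(rotate(x)) for every input x."""
    return all(truth_table[x] == truth_table[rotate(x, n)] for x in range(1 << n))


if __name__ == "__main__":
    n = 8
    orbits = rotation_orbits(n)
    print(f"{len(orbits)} orbits, so 2^{len(orbits)} rotation symmetric functions")

    parity = [bin(x).count("1") & 1 for x in range(1 << n)]  # x1 + ... + xn
    first_bit = [x & 1 for x in range(1 << n)]               # a single variable
    print(is_rotation_symmetric(parity, n))
    print(is_rotation_symmetric(first_bit, n))
```

For n = 8 there are 36 orbits, so there are 2^36 rotation symmetric functions, far fewer than the 2^256 functions on 8 variables.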

Fig. 8 shows the distribution of 8-variable rotation
symmetric functions by nonlinearity. More rotation symmetric
functions have nonlinearity around 110
than any other value. Relatively few have low nonlinearity
(0 - 75) or high nonlinearity (> 113). This distribution
resembles the nonlinearity distribution over all functions,
which is known only for n = 4 [1]. Table 6 shows the
resource usage on the Xilinx Virtex2 Pro.

Fig. 8. Distribution of 8-variable rotation symmetric functions by nonlinearity


TABLE 6. Resources used to compute the nonlinearity of 8-variable rotation symmetric functions

Number of           Number/Total     Percentage
Slice Flip-Flops    9,531/88,192     10%
4-Input LUTs        8,850/88,192     10%
Occupied Slices     8,540/44,096     19%

6.  Concluding  Remarks 


We show that the reconfigurable computer is an effective
research tool in bent function discovery. Because we adapt
the architecture to the problem, we achieve significant
efficiencies. Indeed, we show that a reconfigurable computer
can achieve better than a 60,000× speed-up over a
conventional computer for 8-variable functions. The
implementation of the transeunt triangle is beneficial in reducing
the number of functions through which we must sieve.
We show that the reduction is better than 500,000,000
to 1 for 6-variable functions. Although the transformation
produced by the transeunt triangle is generally accepted
as correct, no proof was previously known. We provide such a proof.
This proof yields the reduced transeunt triangle, which
produces the same transformation as the full transeunt
triangle, but with significantly fewer gates and less delay.
We show examples of results obtained from this tool. For
other results, see Schneider [16] and Shafer [17].

Nonlinearity  is  only  one  type  of  cryptographic  property. 
Other  types  include  strict  avalanche  criterion,  propagation 
criterion,  correlation  immunity,  and  algebraic  immunity. 
There  is  significant  promise  in  exploiting  the  efficiencies 
of  a  reconfigurable  computer  to  make  new  discoveries  in 
these  important  topics. 